Hasegawa, Yuta; Onodera, Naoyuki; Idomura, Yasuhiro
In the "CityLBM" project at JAEA, a real-time urban wind prediction code based on AMR (adaptive mesh refinement) was developed. For the next generation of the CityLBM code, ensemble simulations are needed to improve the reliability of the prediction. For this purpose, the memory footprint of each simulation must be reduced to fit within a single node, i.e., 4-16 GPUs. To reduce memory usage and accelerate data communication in the AMR code, we investigated an intra-node multi-GPU implementation based on CUDA Unified Memory. This approach simplifies parallel GPU implementation, because accesses to Unified Memory are automatically served through HBM2 (for the local GPU) or NVLink (for a neighboring GPU). We implemented multi-GPU calculations of a 3D diffusion equation and a lattice Boltzmann equation on a uniform mesh, and evaluated weak/strong scalability and NVLink performance.
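The intra-node multi-GPU pattern described above can be illustrated with a minimal sketch (not the authors' code): a 1D diffusion step on a single `cudaMallocManaged` array, with each GPU updating its own contiguous slab. All kernel names, sizes, and the 1D simplification are assumptions for illustration; the point is that stencil reads at slab boundaries touch the neighboring GPU's pages, which Unified Memory serves automatically (over NVLink where available) without explicit `cudaMemcpy` calls.

```cuda
#include <algorithm>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example: one explicit-Euler diffusion step on a slice
// [begin, end) of a 1D field stored in Unified Memory.
__global__ void diffuse(const float *in, float *out, int begin, int end, int n) {
    int i = begin + blockIdx.x * blockDim.x + threadIdx.x;
    if (i < end) {
        float left  = (i > 0)     ? in[i - 1] : in[i];
        float right = (i < n - 1) ? in[i + 1] : in[i];
        out[i] = in[i] + 0.25f * (left - 2.0f * in[i] + right);
    }
}

int main() {
    const int n = 1 << 20;
    int nGpu = 0;
    cudaGetDeviceCount(&nGpu);

    // A single managed allocation is visible to every GPU in the node;
    // pages are migrated or accessed remotely on demand.
    float *u, *uNew;
    cudaMallocManaged(&u,    n * sizeof(float));
    cudaMallocManaged(&uNew, n * sizeof(float));
    for (int i = 0; i < n; ++i) u[i] = (i == n / 2) ? 1.0f : 0.0f;

    // Each GPU owns a contiguous slab. The halo reads at the slab edges
    // reach the neighbor GPU's pages via Unified Memory (NVLink path),
    // so no explicit halo exchange is coded here.
    int chunk = (n + nGpu - 1) / nGpu;
    for (int g = 0; g < nGpu; ++g) {
        cudaSetDevice(g);
        int begin = g * chunk;
        int end   = std::min(n, begin + chunk);
        int threads = 256, blocks = (end - begin + threads - 1) / threads;
        diffuse<<<blocks, threads>>>(u, uNew, begin, end, n);
    }
    for (int g = 0; g < nGpu; ++g) { cudaSetDevice(g); cudaDeviceSynchronize(); }

    printf("uNew[n/2] = %f\n", uNew[n / 2]);
    cudaFree(u);
    cudaFree(uNew);
    return 0;
}
```

In a production AMR code, one would additionally use `cudaMemAdvise` (e.g. `cudaMemAdviseSetPreferredLocation`) to pin each slab's pages to its owning GPU, so that only the halo traffic crosses NVLink rather than whole pages migrating back and forth.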